Abstract

We demonstrate that there is an intimate relationship between the magnetic properties of 
Derrida's random energy model (REM) of spin glasses and the problem of joint source-channel
coding in Information Theory. In particular, typical patterns of erroneously decoded messages 
in the coding problem have "magnetization" properties that are analogous to those of the REM 
in certain phases, where the non-uniformity of the distribution of the source in the coding problem
plays the role of an external magnetic field applied to the REM. We also relate the ensemble
performance (random coding exponents) of joint source-channel codes to the free energy of the 
REM in its different phases. 

Keywords: spin glasses, REM, phase transitions, magnetization, information theory, joint 
source-channel codes. 



1 Introduction 

In the last few decades it has become apparent that many problems in Information Theory, and 
coding problems in particular, can be mapped onto (and interpreted as) analogous problems in the 
area of statistical physics of disordered systems, most notably, spin glass models. Such analogies are 
useful because physical insights, as well as statistical mechanical tools and analysis techniques (like 
the replica method), can be harnessed in order to advance the knowledge and the understanding 
with regard to the information-theoretic problem under discussion (and conversely,
information-theoretic approaches to problems in physics may sometimes prove useful to physicists as well). A
very small, and by no means exhaustive, sample of works along this line includes references [1]-[25].
In particular, Sourlas [8], [9] was the first to observe that there are strong analogies and parallelisms
between the behavior of ensembles of error correcting codes and certain spin glass models with
quenched parameters, like the p-spin glass model and Derrida's random energy model (REM)
[26], [27], [28], at least as far as the mathematical formalism goes. The REM is an
especially attractive model to adopt in this context, as it is, on the one hand, exactly solvable, and
on the other hand, rich enough to exhibit phase transitions. As noted in [5, Chap. 6] and [16],
ensembles of error correcting codes 'inherit' these phase transitions from the REM when viewed
as physical systems whose phase diagram is defined in the plane of the coding rate vs. decoding
temperature. In [29], this topic was further investigated and ensemble performance figures of error
correcting codes (random coding exponents) were related to the free energies in the various phases
of the phase diagram.

While the above-described relation takes place between pure channel coding and the REM 
without any external magnetic field, in this work, we demonstrate that there are also intimate 
relationships between combined source/channel coding and the REM with such a magnetic field. In 
particular, it turns out that typical patterns of erroneously decoded messages in the source/channel 
coding problem have "magnetization" properties that are analogous to those of the REM in certain 
phases, where the non-uniformity of the distribution of the source in the joint source-channel
coding system plays the role of an external magnetic field applied to the spin glass modeled by the
REM. We also relate the ensemble performance (random coding exponents) of joint source-channel 
codes to the free energy of the REM in its different phases. 

The outline of this paper is as follows. In Section 2, we provide some background, both on the 
information theoretic aspect of this work, which is the problem of joint source channel coding, and 
the statistical mechanical aspect, which is the REM and its magnetic properties. In Section 3, we 
present the phase diagram pertaining to finite-temperature decoding of an ensemble of joint source- 
channel codes and characterize the free energies in the various phases. Finally, in Section 4, we 
derive random coding exponents pertaining to this ensemble and demonstrate their relationships
to the free energies. 

2 Background



In this section, we give some very basic background which will be needed in the sequel. In Subsec- 
tion 2.1, we provide a brief overview of Shannon's fundamental coding theorems, the skeleton of 
Information Theory: The source coding theorem, the channel coding theorem, and finally the joint 
source-channel coding theorem. In Subsection 2.2, we review a few models of spin glasses, with 
special emphasis on the REM. 

2.1 Information Theory 

2.1.1 Source Coding 

Suppose we wish to compress a sequence of $N$ bits, $(u_1, u_2, \ldots, u_N)$, drawn from a stationary
memoryless binary source, i.e., each bit is drawn independently, where $\Pr\{u_i = 1\} = 1 - \Pr\{u_i = 0\} = q$.
Shannon's source coding theorem (see, e.g., [30, Chap. 5]) states that if we demand that
the source sequence be perfectly reconstructible from the compressed data, then the best
achievable compression ratio (i.e., the smallest average ratio between the compressed message length
and the original source message length, $N$), in the limit of large $N$, is given by the entropy of the
source, which in the binary memoryless case considered here is given by:

$$h(q) = -q\log_2 q - (1-q)\log_2(1-q).$$

Many practical coding algorithms are known to achieve h(q) asymptotically, e.g., Huffman coding, 
Shannon coding, arithmetic coding, and Lempel-Ziv coding, to name a few [30].
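
As a small numerical illustration of the entropy limit discussed above, the following Python sketch evaluates the binary entropy of the source in bits. The function name `binary_entropy_bits` is our own choice for illustration and does not appear in the cited references.

```python
import math


def binary_entropy_bits(q: float) -> float:
    """Binary entropy h(q) = -q log2 q - (1 - q) log2 (1 - q), in bits."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if q in (0.0, 1.0):
        return 0.0  # limiting value: a deterministic source needs no bits
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


for q in (0.5, 0.11, 0.01):
    # h(q) is the smallest achievable number of compressed bits per source bit
    print(f"q = {q}: h(q) = {binary_entropy_bits(q):.4f}")
```

A fair source ($q = 1/2$) is incompressible, whereas a strongly biased source can be compressed to a small fraction of its original length.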

2.1.2 Channel Coding 

Shannon's celebrated channel coding theorem (see, e.g., [30, Chap. 7]) is about reliable transmission
of digital information across a noisy channel: Suppose we wish to transmit a binary message of
$k$ bits, indexed by $m$ ($0 \le m \le 2^k - 1$), through a noisy binary symmetric channel, which flips
the transmitted bit with probability $p$ or conveys it unaltered with probability $1 - p$. If we wish
to convey the message via the channel reliably (i.e., with very small probability of error), then
before we transmit the message via the channel, we have to encode it, i.e., map it in a sophisticated
manner into a longer binary message of length $n$ ($n > k$) and then transmit the encoded message
$\mathbf{x}(m) = (x_1(m), \ldots, x_n(m))$. The ratio $R = k/n$ is called the coding rate. It measures how
efficiently the channel is used, i.e., how many information bits are conveyed per channel use.
The corresponding channel output sequence, $\mathbf{y} = (y_1, \ldots, y_n)$ (with some of the bits flipped by the
channel), is received at the decoder.

The optimum decoder, in the sense of minimum probability of error, is
the maximum a-posteriori (MAP) decoder, i.e., it selects the message $m$, $0 \le m \le 2^k - 1$, which
maximizes the posterior probability given $\mathbf{y}$, that is, $P(m|\mathbf{y})$, or equivalently, it maximizes the product
$P(m)P(\mathbf{y}|\mathbf{x}(m))$, where $P(m)$ is the prior probability of message $m$ and $P(\mathbf{y}|\mathbf{x}(m))$ is the conditional
probability of the observed $\mathbf{y}$ given that $\mathbf{x}(m)$ was transmitted. In the important special case where
all messages are a-priori equiprobable, that is, $P(m) = 2^{-k}$ for all $m$, the MAP decoding rule boils
down to the maximization of $P(\mathbf{y}|\mathbf{x}(m))$, which is the maximum likelihood (ML) decoding rule.

Channel capacity $C$ is defined as the supremum of all coding rates $R$ for which there still exist
encoders and decoders which make the probability of error arbitrarily small provided that $n$ is large
enough (keeping $R = k/n$ fixed). Shannon's channel coding theorem provides a formula for the
channel capacity, which in the binary case considered here is given by

$$C = 1 - h(p) = 1 + p\log_2 p + (1-p)\log_2(1-p).$$

One of the mainstream efforts in the Information Theory literature has revolved around devising
practical coding and decoding schemes, in terms of computational complexity and storage, with 
rates close to capacity. 

2.1.3 Joint Source Channel Coding 

Finally, we consider the problem of joint source-channel coding (see, e.g., [30, Sect. 7.13]): Suppose
we have a binary memoryless source, as in Subsection 2.1.1, and a binary memoryless
channel, as in Subsection 2.1.2. We assume that by the time that the source generates
$N$ symbols, the channel can transmit $n = N\theta$ bits ($\theta > 0$ is fixed).

A joint source-channel code maps the source sequence $\mathbf{u} = (u_1, \ldots, u_N)$ of length $N$ into a
channel input sequence $\mathbf{x}(\mathbf{u})$ of length $n$. The decoder, which receives the channel output vector
$\mathbf{y}$, estimates $\mathbf{u}$ either by the symbol MAP decoder, which minimizes the symbol error probability
(or the bit error probability), or by the word MAP decoder, which, as mentioned earlier, minimizes
the word error probability. The word MAP decoder works similarly to the above-described MAP
decoder for a channel code: It estimates the source sequence as a whole by seeking the vector $\mathbf{u}$
that maximizes $P(\mathbf{u})P(\mathbf{y}|\mathbf{x}(\mathbf{u}))$, where $P(\mathbf{u})$ is the probability of the source vector $\mathbf{u}$. The symbol
MAP decoder, on the other hand, estimates each bit $u_i$ of the source separately by seeking the
symbol $u \in \{0, 1\}$ that maximizes $\Pr\{u_i = u, \mathbf{y}\} = \sum_{\mathbf{u}:\, u_i = u} P(\mathbf{u})P(\mathbf{y}|\mathbf{x}(\mathbf{u}))$, $i = 1, \ldots, N$.

These two decoders can be thought of as two special cases of a more general class of decoders,
referred to as finite-temperature decoders [18]. A finite-temperature decoder estimates the $i$-th
symbol $u_i$ by

$$\hat{u}_i = \arg\max_{u \in \{0,1\}} \sum_{\mathbf{u}:\, u_i = u} [P(\mathbf{u})P(\mathbf{y}|\mathbf{x}(\mathbf{u}))]^\beta,$$

where the parameter $\beta$ can be thought of as an inverse temperature parameter. The choice $\beta = 1$
corresponds to the symbol MAP decoder, whereas $\beta \to \infty$ gives us the word MAP decoder [5,
Chap. 6].
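
To make the finite-temperature decoding rule concrete, here is a minimal brute-force sketch in Python. It enumerates all source vectors, so it is only usable for very small $N$. The repetition codebook, the function names, and the parameter values are hypothetical choices for illustration only; the weights are handled in the log domain (with a common shift) purely to avoid numerical underflow at large $\beta$.

```python
import itertools
import math


def log_source_prob(u, q):
    """log P(u) for a memoryless binary source with Pr{u_i = 1} = q."""
    ones = sum(u)
    return ones * math.log(q) + (len(u) - ones) * math.log(1.0 - q)


def log_bsc_likelihood(y, x, p):
    """log P(y|x) for a binary symmetric channel with crossover probability p."""
    d = sum(a != b for a, b in zip(x, y))  # Hamming distance
    return d * math.log(p) + (len(y) - d) * math.log(1.0 - p)


def finite_temperature_decode(y, codebook, q, p, beta):
    """Per bit, pick u maximizing the sum over {u: u_i = u} of [P(u)P(y|x(u))]^beta."""
    log_w = {u: beta * (log_source_prob(u, q) + log_bsc_likelihood(y, x, p))
             for u, x in codebook.items()}
    shift = max(log_w.values())  # common factor; does not change the argmax
    n_bits = len(next(iter(codebook)))
    estimate = []
    for i in range(n_bits):
        score = [0.0, 0.0]
        for u, lw in log_w.items():
            score[u[i]] += math.exp(lw - shift)
        estimate.append(1 if score[1] > score[0] else 0)
    return tuple(estimate)


# Hypothetical toy code: N = 2 source bits, each repeated three times (n = 6).
codebook = {u: tuple(b for b in u for _ in range(3))
            for u in itertools.product((0, 1), repeat=2)}
y = (1, 0, 1, 0, 0, 0)  # codeword of u = (1, 0) with its second bit flipped
for beta in (1.0, 50.0):  # symbol MAP, and a regime close to word MAP
    print(beta, finite_temperature_decode(y, codebook, q=0.5, p=0.1, beta=beta))
```

In this toy example both temperatures return the same estimate; the two decoders can differ only when several competing source vectors carry comparable posterior weight.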

The joint source-channel coding theorem asserts that a necessary and sufficient condition for
the existence of codes under which, for large enough $n$ and $N$ (with $\theta = n/N$ fixed), $\mathbf{u}$ can be decoded with
arbitrarily small probability of error (both wordwise and symbolwise) is given by

$$h(q) < \theta C. \qquad (1)$$

One approach to achieve reliable communication, whenever this condition holds, is to apply separate
source coding and channel coding: First compress the source to essentially $h(q)$ bits per symbol,
resulting in a binary compressed message of length about $Nh(q) = nh(q)/\theta$ bits, as described in
Subsection 2.1.1, and then use a reliable channel code of rate $R = h(q)/\theta < C$ to convey
the compressed message, as described in Subsection 2.1.2. The decoder will first decode
the message by the corresponding channel decoder and then decompress the resulting message.
Another approach is to map $\mathbf{u}$ directly to a channel input vector $\mathbf{x}(\mathbf{u})$. It can be shown [31,
Exercise 5.16, p. 534] that by a random selection of a code from the uniform ensemble (i.e., by
generating each codeword $\mathbf{x}(\mathbf{u})$, $\mathbf{u} \in \{0, 1\}^N$, independently by a sequence of $n$ fair coin tosses),
the average probability of error, over this ensemble of codes, tends to zero as the block length goes
to infinity, as long as the above necessary and sufficient condition holds.
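
The condition of the joint source-channel coding theorem is easy to check numerically. The sketch below, with entropies measured in bits as in this subsection, tests whether the source entropy is below $\theta$ times the capacity of the binary symmetric channel. The function names are ours, chosen for illustration.

```python
import math


def binary_entropy_bits(x: float) -> float:
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def bsc_capacity_bits(p: float) -> float:
    """Capacity C = 1 - h(p) of a binary symmetric channel, in bits per use."""
    return 1.0 - binary_entropy_bits(p)


def reliable_communication_possible(q: float, p: float, theta: float) -> bool:
    """Check the condition h(q) < theta * C for source bias q and crossover p."""
    return binary_entropy_bits(q) < theta * bsc_capacity_bits(p)


for q, p, theta in ((0.1, 0.1, 1.0), (0.5, 0.1, 1.0), (0.5, 0.1, 2.0)):
    print(q, p, theta, reliable_communication_possible(q, p, theta))
```

For example, a fair source cannot be sent over a channel with $p = 0.1$ at one channel use per source bit, but it can at two channel uses per source bit.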

2.2 The REM



Consider a spin glass with $n$ spins, designated by a binary vector $\boldsymbol{\sigma} = (\sigma_1, \ldots, \sigma_n)$, $\sigma_i \in \{-1, +1\}$,
$i = 1, 2, \ldots, n$. The simplest model of this class is that of a paramagnetic solid, namely, the one
where the only effect is that of the external magnetic field $H$, whereas the effect of interactions is
negligible (cf. [32, Chap. 3]). Assuming that the spin directions are all either parallel or antiparallel
to the direction of the external magnetic field, the energy associated with a configuration $\boldsymbol{\sigma}$ is given
(in the appropriate units) by:

$$\mathcal{E}(\boldsymbol{\sigma}) = -H\sum_{i=1}^{n}\sigma_i, \qquad (2)$$

which means (according to the Boltzmann distribution) that each spin is independently oriented upward
($+1$) with probability $e^{\beta H}/[2\cosh(\beta H)]$ or downward ($-1$) with probability $e^{-\beta H}/[2\cosh(\beta H)]$.
This means that the average (net) magnetic moment per spin is

$$m = \frac{e^{\beta H} - e^{-\beta H}}{2\cosh(\beta H)} = \tanh(\beta H),$$

and so the average internal energy per particle is $E = -H\tanh(\beta H)$ and the free energy per particle
is $F = -\frac{1}{\beta}\ln[2\cosh(\beta H)]$.
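
The closed-form paramagnetic quantities can be checked by brute-force enumeration for a small number of spins. The sketch below assumes the non-interacting energy $-H\sum_i \sigma_i$ (the form consistent with the stated internal energy) and compares the Boltzmann averages with $\tanh(\beta H)$, $-H\tanh(\beta H)$ and $-\ln[2\cosh(\beta H)]/\beta$. Function and variable names are ours.

```python
import itertools
import math


def paramagnet_brute_force(n: int, beta: float, H: float):
    """Return (magnetization, energy, free energy), each per spin, by enumeration."""
    Z = 0.0
    m_sum = 0.0
    e_sum = 0.0
    for sigma in itertools.product((-1, +1), repeat=n):
        total_spin = sum(sigma)
        energy = -H * total_spin          # assumed non-interacting Hamiltonian
        weight = math.exp(-beta * energy)  # Boltzmann weight
        Z += weight
        m_sum += weight * total_spin / n
        e_sum += weight * energy / n
    return m_sum / Z, e_sum / Z, -math.log(Z) / (beta * n)


beta, H = 0.7, 1.3
m, e, f = paramagnet_brute_force(8, beta, H)
print(m, math.tanh(beta * H))
print(e, -H * math.tanh(beta * H))
print(f, -math.log(2.0 * math.cosh(beta * H)) / beta)
```

Because the spins are independent, the agreement holds for every $n$, not only asymptotically.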

More involved (and more interesting) situations occur, of course, when the effect of mutual 
interactions among the spins is appreciable. The simplest model that accounts for interactions is 
the Ising model, given by

$$\mathcal{E}(\boldsymbol{\sigma}) = -J\sum_{\langle i,j \rangle}\sigma_i\sigma_j - H\sum_{i=1}^{n}\sigma_i, \qquad (3)$$

where the second term is the contribution of the external magnetic field as before, and in the first
term, pertaining to the interaction, $J$ describes the intensity of the interaction, with the summation
being defined over pairs of neighboring spins (depending on the geometry of the problem).

More general models allow interactions not only with immediate neighbors, but also with more
distant ones, and then there are different strengths of interaction, depending on the distance between
the two spins. In this case, the first term is replaced by the more general form $-\sum_{i,j}J_{ij}\sigma_i\sigma_j$, where
now the sum can be defined over all possible pairs $\{(i,j)\}$. Here, in addition to the ferromagnetic
case, where all $J_{ij} > 0$, and the antiferromagnetic case, where all $J_{ij} < 0$, there is also a mixed
situation where some $J_{ij}$ are positive and others are negative, which is the case of a spin glass.
Here, not all spin pairs can be in their preferred mutual position (parallel/antiparallel), thus the
system may be frustrated.

To model situations of disorder, it is common to model $\{J_{ij}\}$ as random variables (RV's) with,
say, equal probabilities of being positive or negative. For example, in the Edwards-Anderson (EA)
model [33], $\{J_{ij}\}$ are taken to be i.i.d. zero-mean Gaussian RV's when $i$ and $j$ are neighbors and
set to zero otherwise. In the Sherrington-Kirkpatrick (SK) model [33], all $\{J_{ij}\}$ are i.i.d., zero-mean
Gaussian RV's. In the $p$-spin-glass model, the interaction terms consist of all products of
combinations of $p$ spins (rather than just pairs) with Gaussian coefficients of the appropriate scaling
(cf., e.g., [35]).

In all these models, the system has two levels of randomness: the randomness of the interaction 
coefficients and the randomness of the spin configuration given the interaction coefficients, according 
to the Boltzmann distribution. However, the two sets of RV's are normally treated differently. The 
random coefficients are commonly considered quenched RV's, namely, they are considered fixed in 
the time scale at which the spin configuration may vary. This is analogous to the model of coded 
communication in a random coding paradigm: A randomly drawn code should normally be thought 
of as a quenched entity, as opposed to the randomness of the source and/or the channel. 

2.2.1 The REM in the Absence of a Magnetic Field 

In [26], [27], [28], Derrida took the above-described idea of randomizing the (parameters of the)
Hamiltonian to an extreme, and suggested a model of spin glass with disorder under which the
energy levels $\{\mathcal{E}(\boldsymbol{\sigma})\}$ are simply i.i.d. RV's, without any structure in the form of (3) or its
above-described extensions. It can also be viewed, however, as the asymptotic behavior of the $p$-spin-glass
model when $p \to \infty$ (a limit to be taken after the limit $n \to \infty$, i.e., $p \ll n$) [35]. In particular,
in the absence of a magnetic field, the $2^n$ RV's $\{\mathcal{E}(\boldsymbol{\sigma})\}$ are taken to be i.i.d., zero-mean Gaussian
RV's, all with variance $nJ^2/2$, where $J$ is a parameter.$^1$ The beauty of the REM is that, on
the one hand, it is very easy to analyze, and on the other hand, it is rich enough to
exhibit phase transitions.

$^1$ The variance scales linearly with $n$ to match the behavior of the Hamiltonian with a limited number of
interacting neighbors and random interaction parameters, which has a number of independent terms that is linear in
$n$.
The basic observation about the REM is that for a typical realization of the configurational
energies $\{\mathcal{E}(\boldsymbol{\sigma})\}$, the density of number of configurations with energy about $E$ (i.e., between $E$ and
$E + dE$), $N(E)$, is proportional (up to sub-exponential terms in $n$) to $2^n \cdot e^{-E^2/(nJ^2)}$, as long as
$|E| < E_0 = nJ\sqrt{\ln 2}$, whereas energy levels outside this range are typically not populated by spin
configurations ($N(E) = 0$), as the probability of having at least one configuration with such an
energy decays exponentially with $n$. Thus, the asymptotic (thermodynamical) entropy per spin,
which is defined by

$$S(E) = \lim_{n \to \infty}\frac{\ln N(E)}{n},$$

is given by

$$S(E) = \begin{cases} \ln 2 - \left(\frac{E}{nJ}\right)^2 & |E| < E_0 \\ -\infty & |E| > E_0. \end{cases}$$

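The partition function of a typical realization of a REM spin glass is then

$$Z(\beta) \doteq \int_{-E_0}^{E_0} dE \cdot N(E) \cdot e^{-\beta E} \doteq \int_{-E_0}^{E_0} dE \cdot e^{nS(E)} \cdot e^{-\beta E}, \qquad (4)$$

where the notation $\doteq$ designates asymptotic equivalence between two functions of $n$ in the
exponential scale.$^2$ The exponential growth rate of $Z(\beta)$,

$$\phi(\beta) = \lim_{n \to \infty}\frac{\ln Z(\beta)}{n},$$

behaves according to

$$\phi(\beta) = \max_{|E| \le E_0}\left[S(E) - \beta \cdot \frac{E}{n}\right] = \max_{|E| \le E_0}\left[\ln 2 - \left(\frac{E}{nJ}\right)^2 - \beta \cdot \frac{E}{n}\right]. \qquad (5)$$

Solving this simple optimization problem, one finds that $\phi(\beta)$ is given by

$$\phi(\beta) = \begin{cases} \ln 2 + \frac{\beta^2 J^2}{4} & \beta < \frac{2}{J}\sqrt{\ln 2} \\ \beta J\sqrt{\ln 2} & \beta \ge \frac{2}{J}\sqrt{\ln 2}, \end{cases}$$

which means that the asymptotic free energy per spin, a.k.a. the free energy density, is given by
(cf. Proposition 5.2):

$$F(\beta) = -\frac{\phi(\beta)}{\beta} = \begin{cases} -\frac{\ln 2}{\beta} - \frac{\beta J^2}{4} & \beta < \frac{2}{J}\sqrt{\ln 2} \\ -J\sqrt{\ln 2} & \beta \ge \frac{2}{J}\sqrt{\ln 2}. \end{cases}$$

$^2$ More precisely, $a_n \doteq b_n$ means that $\lim_{n \to \infty}\frac{1}{n}\log\frac{a_n}{b_n} = 0$.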
Thus, the free energy density undergoes a phase transition at the inverse temperature $\beta_c = \frac{2}{J}\sqrt{\ln 2}$.
At high temperatures ($\beta < \beta_c$), which is referred to as the paramagnetic phase, the partition
function is dominated by an exponential number of configurations with energy $E = -n\beta J^2/2$ and
the entropy grows linearly with $n$. When the system is cooled to $\beta = \beta_c$ and beyond, which is
the glassy phase, the system freezes but it is still in disorder: the partition function is dominated
by a subexponential number of configurations of minimum energy $E = -E_0$. The entropy, in this
case, grows sublinearly with $n$, namely the entropy per spin vanishes, and the free energy density
no longer depends on $\beta$. Further details about the REM can be found in [5] and the references
mentioned in the Introduction.
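
The two-phase behavior described above can be reproduced numerically. The following sketch implements the closed-form growth rate of the partition function, $\ln 2 + \beta^2J^2/4$ in the paramagnetic phase and $\beta J\sqrt{\ln 2}$ in the glassy phase, and compares it with a direct grid maximization of $\ln 2 - (\epsilon/J)^2 - \beta\epsilon$ over the normalized energy $\epsilon = E/n$ in the populated range $|\epsilon| \le J\sqrt{\ln 2}$. The function names and the grid resolution are our own choices.

```python
import math

LN2 = math.log(2.0)


def rem_beta_c(J: float) -> float:
    """Critical inverse temperature (2/J) sqrt(ln 2) of the REM without a field."""
    return 2.0 * math.sqrt(LN2) / J


def rem_phi(beta: float, J: float) -> float:
    """Closed-form exponential growth rate of Z(beta)."""
    if beta < rem_beta_c(J):
        return LN2 + (beta * J) ** 2 / 4.0   # paramagnetic phase
    return beta * J * math.sqrt(LN2)          # glassy phase


def rem_free_energy(beta: float, J: float) -> float:
    """Free energy density F = -phi / beta."""
    return -rem_phi(beta, J) / beta


def rem_phi_grid(beta: float, J: float, points: int = 20001) -> float:
    """Grid maximization of ln 2 - (eps/J)^2 - beta*eps over |eps| <= J sqrt(ln 2)."""
    e0 = J * math.sqrt(LN2)
    best = -math.inf
    for k in range(points):
        eps = -e0 + 2.0 * e0 * k / (points - 1)
        best = max(best, LN2 - (eps / J) ** 2 - beta * eps)
    return best


for beta in (0.5, 1.0, 3.0):
    print(beta, rem_phi(beta, 1.0), rem_phi_grid(beta, 1.0), rem_free_energy(beta, 1.0))
```

Above $\beta_c$ the maximizer sticks to the boundary $\epsilon = -J\sqrt{\ln 2}$, which is why the free energy density stops depending on $\beta$.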

2.2.2 The REM in the Presence of a Magnetic Field 

The random energy levels of the REM, as described above, represent the interaction energies among
the various spins in the absence of an external magnetic field. In the presence of an external uniform
magnetic field, $H$ (cf. [26], [27], [28]), the Hamiltonian of the system should be supplemented with
the term $-H\sum_i\sigma_i = -nm(\boldsymbol{\sigma})H$ (cf. eq. (3)), where

$$m(\boldsymbol{\sigma}) = \frac{1}{n}\sum_{i=1}^{n}\sigma_i = \frac{n_+(\boldsymbol{\sigma}) - [n - n_+(\boldsymbol{\sigma})]}{n} = \frac{2n_+(\boldsymbol{\sigma})}{n} - 1$$

is the magnetization associated with the configuration $\boldsymbol{\sigma}$, and $n_+(\boldsymbol{\sigma})$ is the number of spins up,
$\sum_{i=1}^{n} 1\{\sigma_i = +1\}$. As far as the statistical description of the REM goes, this shifts the expectation
of the random energy level $\mathcal{E}(\boldsymbol{\sigma})$ from zero to $-nm(\boldsymbol{\sigma})H$. Equivalently, we can assign the same
zero-mean Gaussian distribution as before to the interaction energy, call it now $\mathcal{E}_I(\boldsymbol{\sigma})$, and add to
each configuration $\boldsymbol{\sigma}$ the term $-nm(\boldsymbol{\sigma})H$. The corresponding partition function would then be:



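$$Z(\beta, H) = \sum_{\boldsymbol{\sigma}} e^{-\beta[\mathcal{E}_I(\boldsymbol{\sigma}) - nm(\boldsymbol{\sigma})H]} = \sum_m\left[\sum_{\boldsymbol{\sigma}:\, m(\boldsymbol{\sigma}) = m} e^{-\beta\mathcal{E}_I(\boldsymbol{\sigma})}\right]e^{\beta nmH} \triangleq \sum_m \zeta(\beta, m)e^{\beta nmH}, \qquad (6)$$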

where $\zeta(\beta, m)$, referred to as the partial partition function, contains only the contributions of
configurations whose magnetization $m(\boldsymbol{\sigma})$ is equal to $m$. The behavior of $\zeta(\beta, m)$ is exactly like
that of the REM without a magnetic field, except that instead of $2^n$ configurations, it has only
$|\{\boldsymbol{\sigma}: m(\boldsymbol{\sigma}) = m\}| \doteq 2^{nh((1+m)/2)}$ configurations, where $h(\cdot)$ is the binary entropy function. By
carrying out a similar analysis as in the previous subsection for $\zeta(\beta, m)$ and then finding the dominant
contribution of $m$ (which is the typical magnetization), one can show (cf. [26], [27], [28]) that there
exists a phase transition at $\beta = \beta_c(H)$, where $\beta_c(H)$ is the unique solution to the equation

$$\beta^2J^2 = 4h\left(\frac{1 + \tanh(\beta H)}{2}\right).$$

It is not difficult to see that $\beta_c(H)$ is a non-increasing function of $|H|$ and therefore $T_c(H) = 1/\beta_c(H)$
is non-decreasing, with a minimum at $H = 0$, given by $T_c(0) = J/(2\sqrt{\ln 2})$ (see Fig. 1).
For high temperatures ($\beta < \beta_c(H)$), where the effect of the interactions among the spins is relatively
insignificant, one observes the ordinary paramagnetic behavior, where the average magnetization is

$$m = m_p(\beta, H) = \tanh(\beta H),$$

whereas for low temperatures ($\beta > \beta_c(H)$), the system is frozen in the spin glass phase where the
magnetization no longer depends on the temperature:

$$m = m_g(H) = \tanh(\beta_c(H) \cdot H).$$
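
The critical inverse temperature $\beta_c(H)$ has no closed form for $H \ne 0$, but it is easy to compute by bisection. The sketch below assumes that the entropy $h$ in the defining equation is measured in nats, which is the convention under which $H = 0$ recovers $\beta_c(0) = (2/J)\sqrt{\ln 2}$; the function names are ours.

```python
import math


def h_nats(x: float) -> float:
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def rem_beta_c(H: float, J: float, iterations: int = 200) -> float:
    """Solve beta^2 J^2 = 4 h((1 + tanh(beta H)) / 2) for beta by bisection."""
    def g(beta: float) -> float:
        return (beta * J) ** 2 - 4.0 * h_nats((1.0 + math.tanh(beta * H)) / 2.0)

    # g(0) = -4 ln 2 < 0, and g >= 0 at beta_c(0) because h <= ln 2.
    lo, hi = 0.0, 2.0 * math.sqrt(math.log(2.0)) / J
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def glassy_magnetization(H: float, J: float) -> float:
    """Frozen-phase magnetization tanh(beta_c(H) * H)."""
    return math.tanh(rem_beta_c(H, J) * H)


for H in (0.0, 0.5, 1.0, 2.0):
    print(H, rem_beta_c(H, 1.0), glassy_magnetization(H, 1.0))
```

The printed values illustrate that $\beta_c(H)$ decreases as $|H|$ grows, i.e., a stronger field raises the freezing temperature.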



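The free energy per spin is given by

$$F(\beta, H) = -\lim_{n \to \infty}\frac{\ln Z(\beta, H)}{n\beta} = \begin{cases} -\left[\frac{\beta J^2}{4} + \frac{1}{\beta}h\left(\frac{1 + \tanh(\beta H)}{2}\right) + H\tanh(\beta H)\right] & \beta < \beta_c(H) \\ -\left[J\sqrt{h\left(\frac{1 + \tanh(\beta_c(H)H)}{2}\right)} + H\tanh(\beta_c(H)H)\right] & \beta \ge \beta_c(H). \end{cases}$$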



As can be seen, no spontaneous magnetization takes place under the REM, even at low temperatures
($H \to 0$ implies $m \to 0$). As for other thermodynamic quantities, we have the average internal
energy per spin 



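$$E(\beta, H) = \frac{\partial}{\partial\beta}[\beta F(\beta, H)] = \begin{cases} -\frac{\beta J^2}{2} - H\tanh(\beta H) & \beta < \beta_c(H) \\ -\frac{\beta_c(H)J^2}{2} - H\tanh(\beta_c(H) \cdot H) & \beta \ge \beta_c(H), \end{cases}$$

the entropy per spin

$$S(\beta, H) = \beta[E(\beta, H) - F(\beta, H)] = \begin{cases} h\left(\frac{1 + \tanh(\beta H)}{2}\right) - \left(\frac{\beta J}{2}\right)^2 & \beta < \beta_c(H) \\ 0 & \beta \ge \beta_c(H), \end{cases}$$

and the magnetic susceptibility

$$\chi = \left.\frac{\partial m}{\partial H}\right|_{H=0} = \begin{cases} \beta & \beta < \beta_c(0) \\ \beta_c(0) & \beta \ge \beta_c(0). \end{cases}$$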

Figure 1: Phase diagram of temperature vs. magnetic field.



2.3 Joint Source-Channel Code Ensembles and the REM in a Magnetic Field

In this subsection, we analyze the behavior of a finite-temperature decoder for a typical randomly
selected code using the tools of the analysis of the REM in a magnetic field. Using the viewpoint
of the magnetic properties of the REM, it will be seen that the source bits play the role of spins in
a magnetic field whose intensity is $H = \frac{1}{2}\ln\frac{q}{1-q}$, where $q$ is the probability that $u_i = 1$ for each $i$.
Accordingly, instead of the binary alphabet $\{0, 1\}$ that we used before, it will prove more convenient
to let each $u_i$ assume values in $\{-1, +1\}$. Another slight change in notation, made
mostly for the sake of convenience, is that instead of defining channel capacity and coding rates
in terms of bits, we will define them in units of nats, where 1 nat = $\log_2 e$ bits. This means that
logarithms will be taken to the natural base $e$ rather than the base 2. Accordingly, $h(q)$ will be
redefined hereafter as $h(q) = -q\ln q - (1-q)\ln(1-q)$ and the capacity of the binary channel
considered in Section 2 will be redefined as $C = \ln 2 - h(p) = \ln 2 + p\ln p + (1-p)\ln(1-p)$.

Consider then a binary memoryless source sequence, $u_1, u_2, \ldots$, $u_i \in \{-1, +1\}$, with a parameter
$q = \Pr\{u_i = 1\}$ and a binary symmetric channel with parameter $p$, which, as described in Section 2,
is assumed to operate $\theta$ times faster than the source; in other words, the channel transmits $\theta$ bits
during the time that the source generates one bit. The number $\theta$ is a positive real which will be
assumed fixed throughout the sequel. Consider a joint source-channel code that receives a source
vector of length $N$, $\mathbf{u} = (u_1, \ldots, u_N)$, and produces a channel input vector $\mathbf{x}(\mathbf{u})$ of length $n = N\theta$.
The block encoder is generated by random selection: We randomly draw $2^N$ binary $n$-vectors,
$\{\mathbf{x}(\mathbf{u}),\ \mathbf{u} \in \{-1, +1\}^N\}$, independently, by fair coin tossing. As described in Section 2, when the
input to the encoder is $\mathbf{u}$, the encoder transmits the corresponding codeword $\mathbf{x} = \mathbf{x}(\mathbf{u})$, and the
decoder, upon receiving the channel output $\mathbf{y}$, applies a finite-temperature decoder

$$\hat{u}_i = \arg\max_{u \in \{-1,+1\}} \sum_{\mathbf{u}:\, u_i = u} [P(\mathbf{u})P(\mathbf{y}|\mathbf{x}(\mathbf{u}))]^\beta, \quad i = 1, 2, \ldots, N.$$

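We can think of this decoder as a symbol MAP decoder pertaining to a posterior distribution given
by

$$P_\beta(\mathbf{u}|\mathbf{y}) = \frac{[P(\mathbf{u})P(\mathbf{y}|\mathbf{x}(\mathbf{u}))]^\beta}{\sum_{\mathbf{u}'}[P(\mathbf{u}')P(\mathbf{y}|\mathbf{x}(\mathbf{u}'))]^\beta} = \frac{e^{-\beta\ln[1/P(\mathbf{u})P(\mathbf{y}|\mathbf{x}(\mathbf{u}))]}}{\sum_{\mathbf{u}'} e^{-\beta\ln[1/P(\mathbf{u}')P(\mathbf{y}|\mathbf{x}(\mathbf{u}'))]}}, \qquad (7)$$

where in the second expression we presented this distribution in the form of the Boltzmann-Gibbs
distribution with a Hamiltonian given by $\ln[1/P(\mathbf{u})P(\mathbf{y}|\mathbf{x}(\mathbf{u}))]$ (see also, e.g., [5, Chap. 6]). The
corresponding partition function is then

$$Z(\beta) = \sum_{\mathbf{u}} [P(\mathbf{u})P(\mathbf{y}|\mathbf{x}(\mathbf{u}))]^\beta = [P(\mathbf{u}_0)P(\mathbf{y}|\mathbf{x}(\mathbf{u}_0))]^\beta + \sum_{\mathbf{u} \ne \mathbf{u}_0} [P(\mathbf{u})P(\mathbf{y}|\mathbf{x}(\mathbf{u}))]^\beta \triangleq Z_c(\beta) + Z_e(\beta), \qquad (8)$$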

where we have separated the partition function into two contributions: $Z_c(\beta)$, corresponding to
the correct source sequence $\mathbf{u}_0$ that was actually generated by the source and fed into the encoder,
and $Z_e(\beta)$, corresponding to all other possible messages. Now, since typically the source produces
sequences with about $Nq$ occurrences of $+1$ and $N(1-q)$ occurrences of $-1$, and the channel
flips about $np$ out of $n$ of the transmitted bits, $Z_c(\beta)$ is typically around
$e^{-N\beta h(q)} \cdot e^{-n\beta h(p)} = e^{-N\beta[h(q) + \theta h(p)]}$. On the other hand, as we will show now, $Z_e(\beta)$ behaves like the REM in a
magnetic field whose intensity is

$$H = \frac{1}{2}\ln\frac{q}{1-q}.$$

Accordingly, we will henceforth denote $Z_e(\beta)$ also by $Z_e(\beta, H)$, to emphasize the analogy to the
REM in a magnetic field.

To see that $Z_e(\beta, H)$ behaves like the REM in a magnetic field, consider the following: first,
denote by $N_1(\mathbf{u})$ the number of $+1$'s in $\mathbf{u}$, so that the magnetization,
$m(\mathbf{u}) = \frac{1}{N}\sum_{i=1}^{N}[1\{u_i = +1\} - 1\{u_i = -1\}]$, pertaining to spin configuration $\mathbf{u}$, is given by
$m(\mathbf{u}) = 2N_1(\mathbf{u})/N - 1$. Equivalently, $N_1(\mathbf{u}) = N(1 + m(\mathbf{u}))/2$, and then

$$P(\mathbf{u}) = q^{N_1(\mathbf{u})}(1-q)^{N - N_1(\mathbf{u})} = (1-q)^N\left(\frac{q}{1-q}\right)^{N(1 + m(\mathbf{u}))/2} = [q(1-q)]^{N/2}\left(\frac{q}{1-q}\right)^{Nm(\mathbf{u})/2} = [q(1-q)]^{N/2}e^{Nm(\mathbf{u})H}, \qquad (9)$$

where $H$ is defined as above. By the same token, for the binary symmetric channel we have:

$$P(\mathbf{y}|\mathbf{x}) = p^{d_H(\mathbf{x},\mathbf{y})}(1-p)^{n - d_H(\mathbf{x},\mathbf{y})} = (1-p)^n e^{-Bd_H(\mathbf{x},\mathbf{y})},$$

where $B = \ln\frac{1-p}{p}$ and $d_H(\mathbf{x}, \mathbf{y})$ is the Hamming distance between $\mathbf{x}$ and $\mathbf{y}$, namely, the number of
places $\{i\}$ where $y_i \ne x_i$. Thus,



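$$\begin{aligned} Z_e(\beta, H) &= [q(1-q)]^{N\beta/2}\sum_m\left[\sum_{\mathbf{x}(\mathbf{u}):\, m(\mathbf{u}) = m} e^{-\beta\ln[1/P(\mathbf{y}|\mathbf{x}(\mathbf{u}))]}\right]e^{N\beta mH} \\ &= [q(1-q)]^{\beta N/2}(1-p)^{n\beta}\sum_m\left[\sum_{\mathbf{x}(\mathbf{u}):\, m(\mathbf{u}) = m} e^{-\beta Bd_H(\mathbf{x}(\mathbf{u}), \mathbf{y})}\right]e^{\beta NmH}, \end{aligned} \qquad (10)$$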



where the resemblance to eq. (6) is self-evident, with $\zeta(\beta, m)$ being redefined as the second
bracketed term. In analogy to the above analysis of the REM, $\zeta(\beta, m)$ here behaves like in the REM
without a magnetic field, namely, it contains exponentially many, $e^{Nh((1+m)/2)} = e^{nh((1+m)/2)/\theta}$, terms,
with the random energy levels of the REM being replaced now by random Hamming distances
$\{d_H(\mathbf{x}(\mathbf{u}), \mathbf{y})\}$ that are induced by the random selection of the code $\{\mathbf{x}(\mathbf{u})\}$.$^3$ Using the same
considerations as with the REM (see also [5]), $\zeta(\beta, m)$ can be represented as $\sum_\delta N_{y,m}(\delta)e^{-\beta Bn\delta}$,
where $N_{y,m}(\delta)$ is the number of vectors $\{\mathbf{u}\}$ with $m(\mathbf{u}) = m$ and $d_H(\mathbf{x}(\mathbf{u}), \mathbf{y}) = n\delta$. Since $N_{y,m}(\delta)$
is the sum of $e^{nh((1+m)/2)/\theta}$ many i.i.d. binary random variables of the form $1\{d_H(\mathbf{x}(\mathbf{u}), \mathbf{y}) = n\delta\}$
(again, with randomness induced by the random selection of $\mathbf{x}(\mathbf{u})$), each with expectation
given by $\Pr\{d_H(\mathbf{x}(\mathbf{u}), \mathbf{y}) = n\delta\} \doteq e^{n[h(\delta) - \ln 2]}$, $N_{y,m}(\delta)$ is typically zero for all $\delta$ such that
$h((1+m)/2)/\theta + h(\delta) - \ln 2 < 0$, and is typically around its expectation, $e^{n[h((1+m)/2)/\theta + h(\delta) - \ln 2]}$,
for all $\delta$ such that $h((1+m)/2)/\theta + h(\delta) - \ln 2 > 0$.

$^3$ Of course, the channel output vector $\mathbf{y}$ is also random, but this randomness does not play any essential role here.
This discussion applies as well for every given $\mathbf{y}$.
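
The mapping from the coding problem to a spin system rests on two elementary identities: the source probability written in terms of the magnetization and the field $H = \frac{1}{2}\ln\frac{q}{1-q}$, and the channel likelihood written as a Boltzmann factor in the Hamming distance with $B = \ln\frac{1-p}{p}$. The following sketch checks both numerically on arbitrary small examples; the function names and test vectors are our own.

```python
import math


def source_prob_direct(u, q):
    """P(u) = q^{N_1} (1 - q)^{N - N_1} for u in {-1, +1}^N."""
    n_plus = sum(1 for s in u if s == +1)
    return q ** n_plus * (1.0 - q) ** (len(u) - n_plus)


def source_prob_magnetic(u, q):
    """P(u) = [q(1 - q)]^{N/2} exp(N m(u) H) with H = 0.5 ln(q / (1 - q))."""
    N = len(u)
    m = sum(u) / N                       # magnetization of the configuration
    H = 0.5 * math.log(q / (1.0 - q))    # effective magnetic field
    return (q * (1.0 - q)) ** (N / 2.0) * math.exp(N * m * H)


def bsc_direct(x, y, p):
    """P(y|x) = p^d (1 - p)^{n - d}, d being the Hamming distance."""
    d = sum(a != b for a, b in zip(x, y))
    return p ** d * (1.0 - p) ** (len(x) - d)


def bsc_boltzmann(x, y, p):
    """P(y|x) = (1 - p)^n exp(-B d) with B = ln((1 - p) / p)."""
    d = sum(a != b for a, b in zip(x, y))
    B = math.log((1.0 - p) / p)
    return (1.0 - p) ** len(x) * math.exp(-B * d)


u = (+1, -1, +1, +1, -1)
print(source_prob_direct(u, 0.3), source_prob_magnetic(u, 0.3))
x, y = (0, 1, 1, 0), (1, 1, 0, 0)
print(bsc_direct(x, y, 0.1), bsc_boltzmann(x, y, 0.1))
```

A biased source ($q \ne 1/2$) thus acts as a nonzero field, and a uniform source corresponds to $H = 0$.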

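Defining now the Gilbert-Varshamov distance $\delta_{GV}(R)$ [3, Chap. 6] as the solution $\delta \le 1/2$ to
the equation $h(\delta) = \ln 2 - R$, the condition $h((1+m)/2)/\theta + h(\delta) - \ln 2 \ge 0$ is equivalent to the
condition $\delta_{GV}(h((1+m)/2)/\theta) \le \delta \le 1 - \delta_{GV}(h((1+m)/2)/\theta)$. Thus, for a typical randomly
selected code,

$$\begin{aligned} \phi(\beta, m) &= \lim_{n \to \infty}\frac{\ln\zeta(\beta, m)}{n} = \max_{\delta \in [\delta_{GV}(h((1+m)/2)/\theta),\ 1 - \delta_{GV}(h((1+m)/2)/\theta)]}\left[\frac{1}{\theta}h\left(\frac{1+m}{2}\right) + h(\delta) - \ln 2 - \beta B\delta\right] \\ &= \begin{cases} \frac{1}{\theta}h\left(\frac{1+m}{2}\right) - \ln 2 + h(p_\beta) - \beta Bp_\beta & p_\beta \ge \delta_{GV}\left(\frac{1}{\theta}h\left(\frac{1+m}{2}\right)\right) \\ -\beta B\delta_{GV}\left(\frac{1}{\theta}h\left(\frac{1+m}{2}\right)\right) & p_\beta < \delta_{GV}\left(\frac{1}{\theta}h\left(\frac{1+m}{2}\right)\right), \end{cases} \end{aligned}$$

where $p_\beta = p^\beta/[p^\beta + (1-p)^\beta]$. The condition $p_\beta \ge \delta_{GV}\left(\frac{1}{\theta}h\left(\frac{1+m}{2}\right)\right)$ is equivalent to the condition

$$\beta \le \beta_0(m) \triangleq \frac{1}{B}\ln\frac{1 - \delta_{GV}(h((1+m)/2)/\theta)}{\delta_{GV}(h((1+m)/2)/\theta)}.$$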
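
The quantities entering this phase condition are straightforward to compute. The sketch below finds the Gilbert-Varshamov distance by bisection (entropies in nats), evaluates the tilted crossover probability $p^\beta/[p^\beta + (1-p)^\beta]$, and computes the threshold inverse temperature for a given magnetization $m$. It assumes $0 < p < 1/2$, so that $B > 0$; the function names are ours.

```python
import math

LN2 = math.log(2.0)


def h_nats(x: float) -> float:
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def delta_gv(R: float, iterations: int = 100) -> float:
    """Solution delta in [0, 1/2] of h(delta) = ln 2 - R, for a rate R in nats."""
    if R >= LN2:
        return 0.0
    target = LN2 - max(R, 0.0)
    lo, hi = 0.0, 0.5  # h_nats is increasing on this interval
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h_nats(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_beta(p: float, beta: float) -> float:
    """p^beta / (p^beta + (1-p)^beta), evaluated as 1 / (1 + exp(beta * B))."""
    B = math.log((1.0 - p) / p)
    return 1.0 / (1.0 + math.exp(min(beta * B, 700.0)))  # cap avoids overflow


def beta_0(m: float, p: float, theta: float) -> float:
    """Largest beta for which p_beta >= delta_GV(h((1+m)/2) / theta)."""
    delta = delta_gv(h_nats((1.0 + m) / 2.0) / theta)
    if delta == 0.0:
        return math.inf
    B = math.log((1.0 - p) / p)
    return math.log((1.0 - delta) / delta) / B


print(delta_gv(0.3), beta_0(0.3, p=0.1, theta=2.0))
```

By construction, evaluating `p_beta` at `beta_0(m, p, theta)` returns the Gilbert-Varshamov distance itself, which is the equality case of the condition above.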
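The exponential order of $\sum_m \zeta(\beta, m)e^{N\beta mH}$, as a function of $N$, is then

$$\psi(\beta, H) = \lim_{N \to \infty}\frac{1}{N}\ln\left[\sum_m \zeta(\beta, m)e^{N\beta mH}\right] = \max_m[\theta\phi(\beta, m) + \beta mH].$$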



For small enough $\beta$, the dominant value of $m$ is the one that maximizes $[h((1+m)/2) + \beta mH]$,
namely, the well-known paramagnetic magnetization $m = m_p(\beta, H) = \tanh(\beta H)$. This is true as
long as $\beta \le \beta_0(\tanh(\beta H))$. Consider then the equation

$$\beta = \beta_0(\tanh(\beta H)),$$

where the unknown is $\beta$, or equivalently, the equation

$$\ln 2 - h(p_\beta) = \frac{1}{\theta}h\left(\frac{1 + \tanh(\beta H)}{2}\right).$$

Now $h((1 + \tanh(\beta H))/2)$ is decreasing with $\beta$, while $[\ln 2 - h(p_\beta)]$ is increasing. At $\beta = 0$,
$\ln 2 - h(p_\beta) = 0$ whereas $h((1 + \tanh(\beta H))/2)/\theta = \ln 2/\theta$. As $\beta \to \infty$, $\ln 2 - h(p_\beta) \to \ln 2$
whereas $h((1 + \tanh(\beta H))/2)/\theta \to 0$, provided that $H \ne 0$. Thus, for $H \ne 0$, there must be a
unique solution, which we shall denote by $\beta_{pg}(H)$, where the subscript "pg" stands for the fact
that this is the boundary curve between the paramagnetic phase and the glassy phase. Since
$h((1 + \tanh(\beta H))/2)/\theta$ is decreasing with $|H|$, $\beta_{pg}(H)$ is decreasing in $|H|$, i.e., the temperature
$T_{pg}(H) = 1/\beta_{pg}(H)$ is increasing in $|H|$, as before (see Fig. 2). As for the case $H = 0$, for $\theta > 1$,
we have

$$\beta_{pg}(0) = \frac{1}{B}\ln\left[\frac{1 - h^{-1}((1 - 1/\theta)\ln 2)}{h^{-1}((1 - 1/\theta)\ln 2)}\right].$$

For $0 < \theta \le 1$, $\beta_{pg}(0) = \infty$, namely, $T_{pg}(0) = 0$, which means that there is no phase transition,
as the behavior is paramagnetic at all temperatures. In the same manner, it is easy to see that
$\beta_{pg}(\infty) = 0$ for all $\theta > 0$, which is another case where there are no phase transitions, but this time,
it is a glassy behavior at all temperatures.
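
The paramagnetic-glassy boundary can be traced numerically by solving $\ln 2 - h(p_\beta) = h((1 + \tanh(\beta H))/2)/\theta$ for $\beta$. The sketch below does this by bracket expansion followed by bisection, relying on the monotonicity argument given above. It assumes $0 < p < 1/2$ and entropies in nats; the function names, the bracket-doubling limit, and the explicit handling of the case $H = 0$, $\theta \le 1$ are our own implementation choices.

```python
import math

LN2 = math.log(2.0)


def h_nats(x: float) -> float:
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def p_beta(p: float, beta: float) -> float:
    """p^beta / (p^beta + (1-p)^beta), evaluated as 1 / (1 + exp(beta * B))."""
    B = math.log((1.0 - p) / p)
    return 1.0 / (1.0 + math.exp(min(beta * B, 700.0)))


def beta_pg(H: float, p: float, theta: float,
            iterations: int = 200, max_doublings: int = 60) -> float:
    """Root in beta of ln 2 - h(p_beta) - h((1 + tanh(beta H)) / 2) / theta."""
    if H == 0.0 and theta <= 1.0:
        return math.inf  # paramagnetic at all temperatures

    def f(beta: float) -> float:
        return (LN2 - h_nats(p_beta(p, beta))
                - h_nats((1.0 + math.tanh(beta * H)) / 2.0) / theta)

    hi = 1.0
    for _ in range(max_doublings):  # f(0) < 0; expand until f(hi) >= 0
        if f(hi) >= 0.0:
            break
        hi *= 2.0
    else:
        return math.inf
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for H in (0.0, 0.2, 0.5, 1.0):
    b = beta_pg(H, p=0.1, theta=2.0)
    print(H, b, 1.0 / b)  # field, boundary inverse temperature, temperature
```

The printed temperatures increase with the field, which is the shape of the paramagnetic-glassy curve described in the text.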

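As long as $\beta \le \beta_{pg}(H)$, we have

$$\psi(\beta, H) = \psi_p(\beta, H) \triangleq h\left(\frac{1 + \tanh(\beta H)}{2}\right) - \theta(\ln 2 - h(p_\beta) + \beta Bp_\beta) + \beta H\tanh(\beta H).$$

On the other hand, for $\beta > \beta_{pg}(H)$, the system is in the glassy phase. In this case,

$$\psi(\beta, H) = \psi_g(\beta, H) = \beta\max_m\left[mH - \theta B\delta_{GV}\left(\frac{1}{\theta}h\left(\frac{1+m}{2}\right)\right)\right];$$

thus, the maximizing $m$ depends only on $H$ but not on $\beta$. In this case, we have $m = m_g(H) = \tanh(\beta_{pg}(H) \cdot H)$ and so

$$\psi_g(\beta, H) = \beta\left[H\tanh(\beta_{pg}(H) \cdot H) - B\theta\delta_{GV}\left(\frac{1}{\theta}h\left(\frac{1 + \tanh(\beta_{pg}(H) \cdot H)}{2}\right)\right)\right] = \beta\left[H\tanh(\beta_{pg}(H) \cdot H) - B\theta p_{\beta_{pg}(H)}\right]. \qquad (12)$$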

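The free-energy density associated with erroneous messages is therefore given by

$$F_e(\beta, H) = -\lim_{N \to \infty}\frac{\ln Z_e(\beta, H)}{N\beta} = -\frac{1}{2}\ln[q(1-q)] - \theta\ln(1-p) - \frac{\psi(\beta, H)}{\beta},$$

i.e.,

$$F_e(\beta, H) = \begin{cases} F_p(\beta, H) & \beta < \beta_{pg}(H) \\ F_g(H) & \beta \ge \beta_{pg}(H), \end{cases}$$

where

$$F_p(\beta, H) = -\frac{1}{2}\ln[q(1-q)] - \theta\ln(1-p) - \frac{1}{\beta}h\left(\frac{1 + \tanh(\beta H)}{2}\right) + \frac{\theta}{\beta}(\ln 2 - h(p_\beta)) + \theta Bp_\beta - H\tanh(\beta H)$$

and

$$F_g(H) = -\frac{1}{2}\ln[q(1-q)] - \theta\ln(1-p) - H\tanh(\beta_{pg}(H) \cdot H) + B\theta p_{\beta_{pg}(H)}.$$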
The boundary between the ferromagnetic phase (where $Z_c(\beta)$ is the dominant term in $Z(\beta)$)
and the glassy phase is the vertical line (see Fig. 2) $H = H_{fg}$, where $H_{fg}$ is the solution to the
equation

$$-\frac{1}{2}\ln[q(1-q)] - \theta\ln(1-p) - H\tanh(H) + \theta Bp = -\frac{1}{2}\ln[q(1-q)] - \theta\ln(1-p) - H\tanh(\beta_{pg}(H)\cdot H) + \theta B\,p_{\beta_{pg}(H)},$$

which after rearranging terms becomes

$$Bp - \frac{H\tanh(H)}{\theta} = B\,p_{\beta_{pg}(H)} - \frac{H\tanh(\beta_{pg}(H)\cdot H)}{\theta},$$

whose solution in turn is achieved when $\beta_{pg}(H) = 1$, i.e.,

$$\frac{h(q)}{\theta} = \ln 2 - h(p),$$

which is nothing but the boundary of reliable communication (1). Thus,

$$H_{fg} = \frac{1}{2}\ln\frac{q^*}{1-q^*} \quad\text{with}\quad q^* = 1 - h^{-1}\big(\theta(\ln 2 - h(p))\big),$$

where $h^{-1}(\cdot)$ is the inverse of the function $h(\cdot)$ in the range where the argument is in $[0, 1/2]$. The
vertical line $H = H_{fg}$ intersects the paramagnetic-glassy boundary curve $T = T_{pg}(H)$ at the triple
point $(H,T) = (H_{fg}, 1)$, namely, $T_{pg}(H_{fg}) = 1$. The ferromagnetic region, pertaining to correct
decoding (where $m = 2q-1 = \tanh(H)$), is $\{(H,T): |H| > H_{fg},\ T < T_{pf}(H)\}$, where $T = T_{pf}(H)$
is the paramagnetic-ferromagnetic boundary curve (see Fig. 2) given by the solution $\beta = 1/T$ of the
equation

$$\beta pB - \frac{\beta H\tanh(H)}{\theta} = \ln 2 + \beta p_\beta B - h(p_\beta) - \frac{1}{\theta}h\!\left(\frac{1+\tanh(\beta H)}{2}\right) - \frac{\beta}{\theta}H\tanh(\beta H)$$

for every given $H$ which is larger than $H_{fg}$ in absolute value. As can be seen, it also contains the
point $(H,T) = (H_{fg}, 1)$.
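As a numerical illustration of the triple point, the sketch below computes $q^*$ and $H_{fg}$ by inverting the binary entropy function with bisection, and evaluates the residual of the $T_{pf}(H)$ equation. It assumes $0 < p < 1/2$, $B = \ln\frac{1-p}{p}$ and $p_\beta = p^\beta/[p^\beta + (1-p)^\beta]$ (definitions not restated in this section); the function names are hypothetical.

```python
import math

LN2 = math.log(2.0)

def h(x):
    """Binary entropy function in nats."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def h_inv(v):
    """Inverse of h on [0, 1/2], where h is increasing, by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def p_beta(p, beta):
    """Assumed definition p^beta / (p^beta + (1-p)^beta), written stably."""
    B = math.log((1.0 - p) / p)
    return 1.0 / (1.0 + math.exp(min(beta * B, 700.0)))

def H_fg(p, theta):
    """Return (H_fg, q_star) for crossover probability p and theta = n/N."""
    q_star = 1.0 - h_inv(theta * (LN2 - h(p)))
    return 0.5 * math.log(q_star / (1.0 - q_star)), q_star

def pf_residual(beta, H, p, theta):
    """Right-hand side minus left-hand side of the T_pf(H) equation."""
    B = math.log((1.0 - p) / p)
    pb = p_beta(p, beta)
    lhs = beta * p * B - beta * H * math.tanh(H) / theta
    rhs = (LN2 + beta * pb * B - h(pb)
           - h((1.0 + math.tanh(beta * H)) / 2.0) / theta
           - beta * H * math.tanh(beta * H) / theta)
    return rhs - lhs

if __name__ == "__main__":
    Hfg, q_star = H_fg(p=0.1, theta=1.0)
    print(Hfg, q_star, pf_residual(1.0, Hfg, 0.1, 1.0))
```

At $\beta = 1$ the residual reduces to $\ln 2 - h(p) - h(q)/\theta$, so it vanishes exactly at $q = q^*$, consistent with the paramagnetic-ferromagnetic curve passing through $(H_{fg}, 1)$.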

Discussion: We see that correct decoding occurs in a sufficiently strong magnetic field. This
is not surprising as a strong magnetic field corresponds to a low-entropy source which can be
transmitted reliably. The above exposition of the magnetization as a function of $H$ and $T$ is
instructive for the understanding of typical error patterns in joint source-channel coding. At
very low temperatures (like in word MAP decoding, which corresponds to $\beta \to \infty$), the
(sub-exponentially few) typical patterns of the erroneously decoded vectors $\{u\}$ have magnetization
dictated by the frozen phase, namely, $m_g(H) = \tanh(\beta_{pg}(H)\cdot H)$, independently of the decoding
temperature. For magnetic fields smaller than $H_{fg}$ in absolute value (namely, for sources with
high entropy), $\beta_{pg}(H) > 1$, which means that the magnetization of a typical erroneously decoded
sequence is higher than that of a typical (correct) source sequence, which is $m_f = 2q - 1 = \tanh(H)$.
If the working temperature is lower than $T_{pg}(0)$, this remains true no matter how small $|H|$ is. If,
on the other hand, $T_{pg}(0) < T < 1$, then when the magnetic field is reduced far enough that
$T > T_{pg}(H)$, the system becomes paramagnetic and the magnetization of the (exponentially many)
erroneously decoded vectors $\{u\}$ is given by $m_p(\beta,H) = \tanh(\beta H)$,
which is still higher than that of the typical source vector $u$, but now it is temperature-dependent.
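The behaviour described in this discussion can be reproduced with a few lines of code. This is only a sketch for high-entropy sources ($|H| < H_{fg}$, so the ferromagnetic phase is not modelled), with the assumed definitions $H = \frac{1}{2}\ln\frac{q}{1-q}$, $p_\beta = p^\beta/[p^\beta + (1-p)^\beta]$, and $\beta_{pg}(H)$ as the root of the paramagnetic entropy; `error_magnetization` is a hypothetical name.

```python
import math

LN2 = math.log(2.0)

def h(x):
    """Binary entropy function in nats."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def p_beta(p, beta):
    """Assumed definition p^beta / (p^beta + (1-p)^beta), written stably."""
    B = math.log((1.0 - p) / p)
    return 1.0 / (1.0 + math.exp(min(beta * B, 700.0)))

def beta_pg(H, p, theta):
    """Assumed: root of h((1+tanh(bH))/2) + theta*(h(p_b) - ln 2) in b (H != 0)."""
    def entropy(b):
        return h((1.0 + math.tanh(b * H)) / 2.0) + theta * (h(p_beta(p, b)) - LN2)
    lo, hi = 0.0, 1.0
    while entropy(hi) > 0.0:
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("no glassy transition found")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if entropy(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def error_magnetization(T, q, p, theta):
    """Typical magnetization of erroneously decoded vectors at temperature T.

    Paramagnetic phase: tanh(beta*H); glassy phase: tanh(beta_pg(H)*H).
    """
    H = 0.5 * math.log(q / (1.0 - q))
    beta = 1.0 / T
    return math.tanh(min(beta, beta_pg(H, p, theta)) * H)

if __name__ == "__main__":
    for T in (0.05, 0.1, 0.7, 0.8):
        print(T, error_magnetization(T, q=0.6, p=0.1, theta=1.0))
```

For $q = 0.6$, $p = 0.1$, $\theta = 1$ the source is in the high-entropy regime; the printed values stay above $2q - 1$ and stop changing with $T$ once $T$ drops below $T_{pg}(H)$.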
Figure 2: Phase diagram of joint source-channel coding: temperature vs. magnetic field.

3 Ensemble Performance of Codes and Free Energies 

In this section, we provide bounds on the ensemble performance of joint source-channel codes
for the binary symmetric source and the binary symmetric channel. In particular, we examine
the exponential decay rate of the average probability of correct decoding (the correct decoding
exponent, for short) when the condition for reliable communication (1) is violated, as well as the
exponential decay rate of the average probability of error (error exponent) when this condition
holds. As will be seen, the former is intimately related to the free energy in the glassy phase,
whereas the latter is strongly related to the free energy in the paramagnetic phase.

The relationship between the previous derivations and both the correct decoding exponent and
the error exponent stems from the fact that both performance measures are bounded by expressions
that are strongly related to the partition function $Z_e(\beta)$.

3.1 The Correct Decoding Exponent 

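The probability of correct decoding pertaining to the word MAP decoder is well known (and can
easily be shown) to be given by

$$\begin{aligned}
P_c &= \sum_y \max_u\,[P(u)P(y|x(u))] \\
&= \sum_y \lim_{\beta\to\infty}\left[\sum_u P^\beta(u)P^\beta(y|x(u))\right]^{1/\beta} \\
&= [q(1-q)]^{N/2}(1-p)^n \sum_y \lim_{\beta\to\infty}\left[\sum_m \zeta(\beta,m)e^{\beta NmH}\right]^{1/\beta},
\end{aligned} \tag{13}$$

where with a slight abuse of notation, here $\zeta(\beta,m)$ is redefined to include all messages $\{u\}$, including
the correct one. Now, taking the ensemble average:

$$\bar{P}_c \doteq [q(1-q)]^{N/2}(1-p)^n \sum_y \sum_m \lim_{\beta\to\infty} E\{\zeta^{1/\beta}(\beta,m)\}\,e^{NmH},$$

where $\doteq$ denotes equality to first order in the exponent. Now,

$$E\{\zeta^{1/\beta}(\beta,m)\} = E\left\{\left[\sum_\delta N_{y,m}(\delta)e^{-\beta Bn\delta}\right]^{1/\beta}\right\} \doteq \sum_\delta E\{N_{y,m}^{1/\beta}(\delta)\}\,e^{-Bn\delta}, \tag{14}$$

where again, $N_{y,m}(\delta)$ is the number of codewords $\{x(u)\}$, corresponding to source words with
$m(u) = m$, which fall at Hamming distance $n\delta$ from $y$. Now, as shown in [29], [36, Appendix],

$$E\{N_{y,m}^{1/\beta}(\delta)\} \doteq \begin{cases}
\exp\left\{n\left[\frac{1}{\theta}h\!\left(\frac{1+m}{2}\right) + h(\delta) - \ln 2\right]\right\} & \frac{1}{\theta}h\!\left(\frac{1+m}{2}\right) + h(\delta) < \ln 2 \\[1ex]
\exp\left\{n\left[\frac{1}{\theta}h\!\left(\frac{1+m}{2}\right) + h(\delta) - \ln 2\right]/\beta\right\} & \frac{1}{\theta}h\!\left(\frac{1+m}{2}\right) + h(\delta) \ge \ln 2
\end{cases} \tag{15}$$

Thus,

$$\lim_{\beta\to\infty} E\{\zeta^{1/\beta}(\beta,m)\} \doteq \exp\left\{n\left[\frac{1}{\theta}h\!\left(\frac{1+m}{2}\right) - \ln 2 + \max_{\delta\le\delta_{GV}(h((1+m)/2)/\theta)}\{h(\delta) - B\delta\}\right]\right\} \tag{16}$$

and so, with the factor $2^n$ coming from the summation over $y$,

$$\bar{P}_c \doteq [q(1-q)]^{N/2}(1-p)^n\,2^n \sum_m \exp\left\{n\left[\frac{1}{\theta}h\!\left(\frac{1+m}{2}\right) - \ln 2 + \max_{\delta\le\delta_{GV}(h((1+m)/2)/\theta)}\{h(\delta) - B\delta\}\right] + NmH\right\}. \tag{17}$$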
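The two regimes of $E\{N^{1/\beta}_{y,m}(\delta)\}$ in (15) can be illustrated with a toy computation. The sketch below models $N$ as a binomial random variable (an assumption standing in for the count of independently drawn codewords that land at a given distance) and computes $E\{N^s\}$, $s = 1/\beta$, by direct summation. When the mean is far below one the moment behaves like $E\{N\}$; when the mean is large it behaves like $(E\{N\})^s$. The function names are hypothetical.

```python
import math

def log_binom_pmf(k, M, r):
    """Log of the Binomial(M, r) probability mass at k, via lgamma."""
    return (math.lgamma(M + 1) - math.lgamma(k + 1) - math.lgamma(M - k + 1)
            + k * math.log(r) + (M - k) * math.log1p(-r))

def frac_moment(M, r, s):
    """E[N^s] for N ~ Binomial(M, r), by direct summation over k >= 1."""
    return sum((k ** s) * math.exp(log_binom_pmf(k, M, r))
               for k in range(1, M + 1))

if __name__ == "__main__":
    M, s = 10000, 0.1  # s plays the role of 1/beta
    # Rare regime: mean count 0.01, so N is almost always 0 or 1.
    rare = frac_moment(M, 1e-6, s)
    print("rare:   E[N^s] / E[N]   =", rare / (M * 1e-6))
    # Typical regime: mean count 5000, so N concentrates around its mean.
    typical = frac_moment(M, 0.5, s)
    print("typical: E[N^s] / E[N]^s =", typical / ((M * 0.5) ** s))
```

Both printed ratios should be close to one, which is the finite-size counterpart of the two branches of (15).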



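The dominant $m$ is the one that maximizes

$$\frac{1}{\theta}h\!\left(\frac{1+m}{2}\right) - \ln 2 + \max_{\delta\le\delta_{GV}(h((1+m)/2)/\theta)}\{h(\delta) - B\delta\} + \frac{mH}{\theta}.$$

Now, if $h(p) < \ln 2 - h((1+m)/2)/\theta$, then the inner maximization is attained at $\delta = p$ and,
after maximizing over $m$ (which gives $m = \tanh(H)$), we get

$$\frac{1}{\theta}h\!\left(\frac{1+\tanh(H)}{2}\right) - \ln 2 + h(p) - Bp + \frac{H\tanh(H)}{\theta}.$$

If $h(p) < \ln 2 - h((1+\tanh(H))/2)/\theta$, namely, the condition for reliable communication holds, this
indeed happens. In this case, we get

$$\bar{P}_c \doteq [q(1-q)]^{N/2}(1-p)^n\cdot 2^n\cdot\exp\left\{n\left[\frac{h(q)}{\theta} - \ln 2 + h(p) - Bp + \frac{2q-1}{2\theta}\ln\frac{q}{1-q}\right]\right\} = 1,$$

as expected. Otherwise, the maximum is attained at the boundary of the allowed range of $\delta$, and
we get

$$\max_m\left[\frac{mH}{\theta} - B\,\delta_{GV}\!\left(\frac{1}{\theta}h\!\left(\frac{1+m}{2}\right)\right)\right] = \frac{H\tanh(\beta_{pg}(H)\cdot H)}{\theta} - B\,\delta_{GV}\!\left(\frac{1}{\theta}h\!\left(\frac{1+\tanh(\beta_{pg}(H)\cdot H)}{2}\right)\right)$$

and so, the correct decoding exponent is

$$\begin{aligned}
E_c &\triangleq -\lim_{N\to\infty}\frac{\ln\bar{P}_c}{N} \\
&= -\frac{1}{2}\ln[q(1-q)] - \theta\ln[2(1-p)] + \theta B\,\delta_{GV}\!\left(\frac{1}{\theta}h\!\left(\frac{1+\tanh(\beta_{pg}(H)\cdot H)}{2}\right)\right) - H\tanh(\beta_{pg}(H)\cdot H) \\
&= F_g(H) - \theta\ln 2.
\end{aligned} \tag{18}$$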



Thus, we obtained a very simple relationship between the correct decoding exponent and the glassy
free energy. The ferromagnetic-glassy phase transition is exactly the transition from $E_c = 0$ to
$E_c > 0$. The dominant magnetization of the correct decoding event is then
$m_g(H) = \tanh(\beta_{pg}(H)\cdot H)$, i.e., the dominant (rare) event of correct decoding is when the
source vector $u$ has the (non-typical) magnetization $m_g(H)$. If the condition of reliable
communication does not hold, i.e., $h(q)/\theta > \ln 2 - h(p)$, then the word MAP decoder
($\beta \to \infty$) works in the glassy regime, but the symbol MAP decoder ($\beta = 1$) works in the
paramagnetic regime. The computation of $P_c$ for the word MAP decoder is carried out also in the
glassy regime.
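The relation $E_c = F_g(H) - \theta\ln 2$ in (18) can be checked numerically by maximizing the exponent of (17) over $m$ on a grid and comparing with the closed form. The sketch below assumes $\theta \ge 1$ (so that $h((1+m)/2)/\theta \le \ln 2$ for every $m$), $B = \ln\frac{1-p}{p}$, $p_\beta = p^\beta/[p^\beta + (1-p)^\beta]$, $\delta_{GV}(R)$ as the solution $\delta \le 1/2$ of $h(\delta) = \ln 2 - R$, and $\beta_{pg}(H)$ as the root of the paramagnetic entropy. All names are hypothetical.

```python
import math

LN2 = math.log(2.0)

def h(x):
    """Binary entropy function in nats."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def p_beta(p, beta):
    B = math.log((1.0 - p) / p)
    return 1.0 / (1.0 + math.exp(min(beta * B, 700.0)))

def beta_pg(H, p, theta):
    """Assumed: root in b of h((1+tanh(bH))/2) + theta*(h(p_b) - ln 2), H != 0."""
    f = lambda b: h((1.0 + math.tanh(b * H)) / 2.0) + theta * (h(p_beta(p, b)) - LN2)
    lo, hi = 0.0, 1.0
    while f(hi) > 0.0:
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("no glassy transition found")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def delta_gv(R):
    """Solution delta in [0, 1/2] of h(delta) = ln 2 - R, by bisection."""
    target = max(LN2 - R, 0.0)
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) < target else (lo, mid)
    return 0.5 * (lo + hi)

def exponent_17(m, H, p, theta):
    """Per-channel-use exponent of the m-th term of (17)."""
    B = math.log((1.0 - p) / p)
    R = h((1.0 + m) / 2.0) / theta
    d = min(p, delta_gv(R))  # h(d) - B*d is increasing for d < p
    return R - LN2 + h(d) - B * d + m * H / theta

def E_c_numeric(q, p, theta, grid=4001):
    """-(1/N) ln of (17), with the sum over m replaced by a grid maximum."""
    H = 0.5 * math.log(q / (1.0 - q))
    best = max(exponent_17(-1.0 + 2.0 * i / (grid - 1), H, p, theta)
               for i in range(grid))
    prefactor = 0.5 * math.log(q * (1.0 - q)) + theta * math.log(2.0 * (1.0 - p))
    return -(prefactor + theta * best)

def E_c_closed(q, p, theta):
    """F_g(H) - theta*ln 2; meaningful when reliable communication fails."""
    H = 0.5 * math.log(q / (1.0 - q))
    B = math.log((1.0 - p) / p)
    b = beta_pg(H, p, theta)
    return (-0.5 * math.log(q * (1.0 - q)) - theta * math.log(2.0 * (1.0 - p))
            + theta * B * p_beta(p, b) - H * math.tanh(b * H))

if __name__ == "__main__":
    print(E_c_numeric(0.6, 0.1, 1.0), E_c_closed(0.6, 0.1, 1.0))
```

The example parameters $q = 0.6$, $p = 0.1$, $\theta = 1$ violate the reliability condition, so the two numbers should agree and be positive; for a pair such as $q = 0.9$, $p = 0.05$ the grid maximum should instead give an exponent of essentially zero.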

3.2 The Error Exponent 

We begin by using Gallager's techniques (see [31, Problem 5.16, pp. 534-535]): The probability of
error for a given code and the word MAP decoder is given by

$$P_e = \sum_u P(u)\sum_y P(y|x(u))\cdot 1\{\exists\, u' \ne u:\ P(u')P(y|x(u')) \ge P(u)P(y|x(u))\}.$$

Now, it is easy to see that whenever an error occurs

$$\sum_{u'\ne u}\left[\frac{P(u')P(y|x(u'))}{P(u)P(y|x(u))}\right]^\beta \ge 1$$

for every $\beta > 0$. Thus,

$$1\{\exists\, u' \ne u:\ P(u')P(y|x(u')) \ge P(u)P(y|x(u))\} \le \left(\sum_{u'\ne u}\left[\frac{P(u')P(y|x(u'))}{P(u)P(y|x(u))}\right]^\beta\right)^\rho$$

for every $\rho > 0$. Substituting the right-hand side into the expression of $P_e$, we get the following
upper bound:

$$P_e \le \sum_u P(u)^{1-\rho\beta}\sum_y P(y|x(u))^{1-\rho\beta}\left(\sum_{u'\ne u}[P(u')P(y|x(u'))]^\beta\right)^\rho, \qquad \beta > 0,\ \rho > 0. \tag{19}$$

Thus, the average error probability over the ensemble of codes is bounded by

$$\bar{P}_e \le \sum_u P(u)^{1-\rho\beta}\sum_y E\left\{P(y|X(u))^{1-\rho\beta}\right\}\,E\left\{\left(\sum_{u'\ne u}[P(u')P(y|X(u'))]^\beta\right)^\rho\right\}. \tag{20}$$
In the binary symmetric case considered here, the first expectation is given by:

$$\begin{aligned}
E\left\{P(y|X(u))^{1-\rho\beta}\right\} &= 2^{-n}\sum_x\prod_{i=1}^n P(y_i|x_i)^{1-\rho\beta} \\
&= 2^{-n}\left[p^{1-\rho\beta} + (1-p)^{1-\rho\beta}\right]^n \\
&= \exp\{-n\gamma(1-\rho\beta)\},
\end{aligned} \tag{21}$$

where $\gamma(s) = \ln 2 - \ln[p^s + (1-p)^s]$. The second expectation is handled as follows. Using the above
derived relation:



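$$\sum_{u'\ne u}[P(u')P(y|X(u'))]^\beta = [q(1-q)]^{N\beta/2}(1-p)^{n\beta}\sum_m \zeta(\beta,m)e^{\beta NmH},$$

we get

$$\begin{aligned}
\left(\sum_{u'\ne u}[P(u')P(y|X(u'))]^\beta\right)^\rho &= [q(1-q)]^{N\beta\rho/2}(1-p)^{n\beta\rho}\left(\sum_m \zeta(\beta,m)e^{\beta NmH}\right)^\rho \\
&\doteq [q(1-q)]^{N\beta\rho/2}(1-p)^{n\beta\rho}\sum_m \zeta^\rho(\beta,m)e^{\beta\rho NmH}
\end{aligned} \tag{22}$$

and so, assuming $\rho \in [0,1]$, and using Jensen's inequality

$$\begin{aligned}
E\left\{\left(\sum_{u'\ne u}[P(u')P(y|X(u'))]^\beta\right)^\rho\right\} &\le [q(1-q)]^{N\beta\rho/2}(1-p)^{n\beta\rho}\sum_m [E\{\zeta(\beta,m)\}]^\rho e^{\beta\rho mHN} \\
&\doteq [q(1-q)]^{N\beta\rho/2}(1-p)^{n\beta\rho}\sum_m\sum_\delta \exp\left\{n\rho\left[\frac{1}{\theta}h\!\left(\frac{1+m}{2}\right) + h(\delta) - \ln 2 - \beta B\delta\right]\right\}e^{\beta\rho mHN} \\
&\doteq [q(1-q)]^{N\beta\rho/2}(1-p)^{n\beta\rho}\times\exp\left\{n\rho\left[\frac{1}{\theta}h\!\left(\frac{1+\tanh(\beta H)}{2}\right) + h(p_\beta) - \ln 2 - \beta Bp_\beta + \frac{\beta H}{\theta}\tanh(\beta H)\right]\right\}.
\end{aligned} \tag{23}$$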
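The dominant terms in the double sum of (23) can be located by a simple grid search. The sketch below maximizes $h((1+m)/2) + \beta mH$ over $m$ and $h(\delta) - \beta B\delta$ over $\delta$, and compares the maximizers with $\tanh(\beta H)$ and with $p_\beta = p^\beta/[p^\beta + (1-p)^\beta]$ (an assumed definition, as are $H = \frac{1}{2}\ln\frac{q}{1-q}$ and $B = \ln\frac{1-p}{p}$). The helper names are hypothetical.

```python
import math

def h(x):
    """Binary entropy function in nats."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def grid_argmax(f, lo, hi, n=20001):
    """Return the grid point in [lo, hi] at which f is largest."""
    points = (lo + (hi - lo) * i / (n - 1) for i in range(n))
    return max(points, key=f)

def dominant_terms(beta, q, p):
    """Grid maximizers (m, delta) of the exponent appearing in (23)."""
    H = 0.5 * math.log(q / (1.0 - q))
    B = math.log((1.0 - p) / p)
    m_star = grid_argmax(lambda m: h((1.0 + m) / 2.0) + beta * m * H, -1.0, 1.0)
    d_star = grid_argmax(lambda d: h(d) - beta * B * d, 0.0, 1.0)
    return m_star, d_star

if __name__ == "__main__":
    beta, q, p = 0.7, 0.8, 0.1
    H = 0.5 * math.log(q / (1.0 - q))
    m_star, d_star = dominant_terms(beta, q, p)
    print("m*:", m_star, "tanh(beta H):", math.tanh(beta * H))
    print("delta*:", d_star, "p_beta:", p ** beta / (p ** beta + (1.0 - p) ** beta))
```

Neither maximization involves $\rho$ or any freezing constraint on $\delta$, which is why the resulting magnetization is the paramagnetic one at every temperature.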

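We see that the magnetization that dominates the Gallager bound is the paramagnetic magnetization.
By plugging this expression back into the bound on $P_e$, we get the error exponent:

$$\begin{aligned}
E &\triangleq -\lim_{N\to\infty}\frac{\ln\bar{P}_e}{N} \\
&\ge -\ln\left[q^{1-\rho\beta} + (1-q)^{1-\rho\beta}\right] + \theta[\gamma(1-\rho\beta) - \ln 2] - \frac{\rho\beta}{2}\ln[q(1-q)] - \beta\rho\theta\ln(1-p) \\
&\qquad - \rho\left\{h\!\left(\frac{1+\tanh(\beta H)}{2}\right) + \theta[h(p_\beta) - \ln 2 - \beta Bp_\beta] + \beta H\tanh(\beta H)\right\} \\
&= -\ln\left[q^{1-\rho\beta} + (1-q)^{1-\rho\beta}\right] + \theta[\gamma(1-\rho\beta) - \ln 2] + \rho\beta F_p(\beta,H) \\
&= -\ln\left\{\left[p^{1-\rho\beta} + (1-p)^{1-\rho\beta}\right]^\theta\cdot\left[q^{1-\rho\beta} + (1-q)^{1-\rho\beta}\right]\right\} + \rho\beta F_p(\beta,H).
\end{aligned} \tag{24}$$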
Here, unlike in the computation of the correct decoding exponent, there is a mismatch between
the phase in the $H$-$T$ plane at which the decoder actually operates, and the phase at which
$P_e$ is analyzed: While the former is ferromagnetic, the latter is paramagnetic regardless of the
temperature.
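The bound (24) is straightforward to evaluate and to optimize over the free parameters. The following sketch does a coarse grid search over $\beta \in (0, 2]$ and $\rho \in (0, 1]$; the grid ranges and resolution are arbitrary choices, and $H = \frac{1}{2}\ln\frac{q}{1-q}$, $B = \ln\frac{1-p}{p}$, $p_\beta = p^\beta/[p^\beta + (1-p)^\beta]$ are assumed definitions. Function names are hypothetical.

```python
import math

LN2 = math.log(2.0)

def h(x):
    """Binary entropy function in nats."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def gamma(s, p):
    """gamma(s) = ln 2 - ln[p^s + (1-p)^s], as defined after (21)."""
    return LN2 - math.log(p ** s + (1.0 - p) ** s)

def F_p(beta, q, p, theta):
    """Paramagnetic free energy density."""
    H = 0.5 * math.log(q / (1.0 - q))
    B = math.log((1.0 - p) / p)
    pb = 1.0 / (1.0 + math.exp(min(beta * B, 700.0)))  # assumed p_beta
    return (-0.5 * math.log(q * (1.0 - q)) - theta * math.log(1.0 - p)
            - h((1.0 + math.tanh(beta * H)) / 2.0) / beta
            + theta * (LN2 - h(pb)) / beta + theta * B * pb
            - H * math.tanh(beta * H))

def bound_24(beta, rho, q, p, theta):
    """Right-hand side of (24) for given beta > 0 and rho in [0, 1]."""
    s = 1.0 - rho * beta
    return (-math.log(q ** s + (1.0 - q) ** s)
            + theta * (gamma(s, p) - LN2)
            + rho * beta * F_p(beta, q, p, theta))

def best_bound(q, p, theta, steps=200):
    """Coarse grid search; returns (value, beta, rho) of the best bound found."""
    best = (0.0, None, None)
    for i in range(1, steps + 1):
        rho = i / steps
        for j in range(1, steps + 1):
            beta = 2.0 * j / steps
            val = bound_24(beta, rho, q, p, theta)
            if val > best[0]:
                best = (val, beta, rho)
    return best

if __name__ == "__main__":
    print(best_bound(q=0.9, p=0.05, theta=1.0))
```

For $q = 0.9$, $p = 0.05$, $\theta = 1$ the reliability condition $h(q)/\theta < \ln 2 - h(p)$ holds, so the search should return a strictly positive exponent; the value is only a lower bound on $E$, limited by the grid.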